Estimation of Parameters in the Three-Parameter
Latent Trait Model¹


Hariharan Swaminathan and Janice A. Gifford
University of Massachusetts, Amherst


ABSTRACT 


Two methods for estimating the parameters of the three-parameter logistic model, the Urry method and the maximum likelihood procedure, were studied with respect to several issues using artificial data. Comparisons were made with respect to the accuracy of estimation and its relationship to the numbers of items and examinees, the effect of the distribution of ability on the resulting estimates of item and ability parameters, and the statistical properties, such as bias and consistency, of the resulting estimates.


¹The project was performed pursuant to a contract from the United States Air Force Office of Scientific Research. However, the opinions expressed here do not necessarily reflect its position or policy, and no official endorsement by the Air Force should be inferred.


The successful application of latent trait theory to practical measurement problems hinges upon the availability of procedures for estimating the parameters. Hence, investigations of the adequacy of the available procedures for estimating parameters in latent trait models are necessary and, indeed, play a crucial role in assessing the usefulness of latent trait theory.


While the problem of estimating parameters in the one-parameter latent trait model appears to be solved, some controversy seems to surround the estimation of parameters in the two- and three-parameter models (Wright, 1977; Andersen, 1973). Lord (1975) has empirically evaluated the maximum likelihood procedure for estimating the parameters in the three-parameter model and has provided answers to some of the questions that arise with respect to parameter estimation. Jensema (1976) has compared the efficiency of a heuristic procedure suggested by Urry (1974) for estimating the parameters in the three-parameter model with that of the maximum likelihood procedure. Despite these efforts, little is known regarding the properties of the estimators in the three-parameter model and the effect on the estimates of violating the underlying assumptions, especially with respect to the revised heuristic procedure suggested by Urry (1976).


The purpose of this study is to investigate the efficiency of the maximum likelihood procedure and the Urry method (Urry, 1976) for estimating parameters in the three-parameter model, to study the properties of the estimators, and to provide some guidelines regarding the conditions under which each should be employed. In particular, the issues investigated are: (1) the "accuracy" of the two estimation procedures, (2) the relationship between the numbers of items and examinees and the accuracy of estimation, (3) the effect of the distribution of ability on the estimates of item and ability parameters, and (4) the statistical properties, such as bias and consistency, of the estimators.

Design of the Study

To investigate the issues mentioned above, artificial data were generated according to the three-parameter logistic model

$$P_i(\theta_j) = c_i + (1 - c_i)\{1 + \exp[-1.7\,a_i(\theta_j - b_i)]\}^{-1} \tag{1}$$

using the DATGEN program of Hambleton and Rovinelli (1973). Data were generated to simulate various testing situations by varying the test length, the number of examinees, and the ability distribution of the examinees. Test lengths were fixed at 10, 15, 20, and 80 items. Since the accuracy of maximum likelihood estimation with large numbers of items has been sufficiently documented by Lord (1975), tests with small numbers of items (10, 15, and 20) were chosen so that the accuracy of the estimation procedures could be ascertained for short tests. This is particularly important if latent trait theory is to be applied to criterion-referenced measurement. The sizes of the examinee population were set at 50, 200, and 1000 in order to study the effect of small sample size on the accuracy of estimation.
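The data-generation step can be sketched in Python. This is not DATGEN, which is not reproduced here; it is a minimal, hypothetical stand-in assuming that each response is scored correct when a uniform random number falls below the probability given by the three-parameter logistic model above. The names `p_correct` and `simulate_responses` are illustrative only.

```python
import math
import random

D = 1.7  # scaling constant in the three-parameter logistic model


def p_correct(theta, a, b, c):
    """Probability of a correct response, P_i(theta_j), under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_responses(thetas, items, rng):
    """Return a 0/1 response matrix with one row per examinee.

    items is a list of (a, b, c) tuples. A response is scored 1 when a
    uniform random number falls below the model probability.
    """
    return [
        [1 if rng.random() < p_correct(theta, a, b, c) else 0 for (a, b, c) in items]
        for theta in thetas
    ]


rng = random.Random(1978)
items = [(1.0, 0.0, 0.25), (1.5, 1.0, 0.25)]           # hypothetical items
thetas = [rng.gauss(0.0, 1.0) for _ in range(200)]     # normally distributed ability
responses = simulate_responses(thetas, items, rng)
print(len(responses), len(responses[0]))  # 200 2
```

Note that at $\theta_j = b_i$ the probability is $c_i + (1 - c_i)/2$, and as ability decreases it approaches the pseudo-chance level $c_i$ rather than zero.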

In the Urry estimation procedure, the relationships that exist between the latent trait parameters and the classical item parameters for item discrimination and item difficulty are exploited (Urry, 1976; Lord & Novick, 1968, pp. 376-378). These relationships are derived under the assumption that ability is normally distributed and that the item characteristic curve is the normal ogive. To study how departures from the assumption of normally distributed ability affect the Urry procedure, three ability distributions were considered: the normal, the uniform, and a negatively skewed distribution. The normal and the uniform distributions were generated with mean zero and variance unity (the uniform distribution was generated on the interval [-1.73, 1.73] to ensure unit variance). A Beta distribution with parameters 5 and 1.5 was generated to simulate a negatively skewed distribution and then rescaled so that its mean was zero and its variance unity. The distributions were standardized to remove the effect of scaling on the parameter estimates.
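The classical relationships the Urry procedure builds on can be illustrated for the simpler case without guessing. The sketch below assumes a normal-ogive item characteristic curve, ability distributed N(0, 1), a biserial correlation `rho` between the item and the latent ability, and a proportion correct `pi`; it returns $a_i = \rho_i/\sqrt{1-\rho_i^2}$ and $b_i = \gamma_i/\rho_i$, where $\gamma_i$ is the normal deviate with $\pi_i = 1 - \Phi(\gamma_i)$. It omits the adjustment for the chance-level parameter and the other steps of Urry's (1976) procedure, so it illustrates the underlying relations only and is not ANCILLES. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_ogive_item_params(rho, pi):
    """Item discrimination and difficulty from classical item statistics.

    rho: biserial correlation between the item and ability (0 < rho < 1)
    pi:  proportion of examinees answering the item correctly (0 < pi < 1)

    Assumes ability ~ N(0, 1), a normal-ogive item characteristic curve,
    and no guessing (c_i = 0).
    """
    gamma = NormalDist().inv_cdf(1.0 - pi)  # threshold with pi = 1 - Phi(gamma)
    a = rho / (1.0 - rho ** 2) ** 0.5
    b = gamma / rho
    return a, b


# An item answered correctly by 84% of examinees with biserial .5: an easy item (b < 0).
a, b = normal_ogive_item_params(0.5, 0.84)
print(round(a, 3), round(b, 3))
```

Because the logistic model used in this study includes the scaling constant 1.7, its $a_i$ is on approximately the same metric as the normal-ogive $a_i$ computed here. The normality assumption enters through the threshold $\gamma_i$, which is why non-normal ability distributions are a concern for this approach.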

The three factors, test length (4 levels), examinee population size (3 levels), and ability distribution (3 levels), were completely crossed to simulate 36 testing situations. Test data arising from these situations were subjected to the Urry estimation procedure using the computer program ANCILLES (developed at the U.S. Civil Service Commission) and to the maximum likelihood estimation procedure using the computer program LOGIST (Wood, Wingersky, & Lord, 1976).
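The completely crossed design can be enumerated directly. This short sketch only lists the 36 conditions; the distribution labels are hypothetical shorthand, and the estimation runs themselves (ANCILLES and LOGIST) are not reproduced.

```python
from itertools import product

test_lengths = [10, 15, 20, 80]                   # items
sample_sizes = [50, 200, 1000]                    # examinees
distributions = ["normal", "uniform", "skewed"]   # ability distributions

# Every combination of the three factors: 4 x 3 x 3 = 36 testing situations.
conditions = list(product(test_lengths, sample_sizes, distributions))

for n_items, n_examinees, dist in conditions[:3]:
    print(n_items, n_examinees, dist)
print(len(conditions))  # 36
```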
Lord (1975) has emphasized that simulated data should, in some way, resemble real data; otherwise, results obtained through simulation studies will not generalize to real situations. Given this, an attempt was made to generate test data as realistically as possible.

To accomplish this, item difficulty parameters, $b_i$, were sampled from a uniform distribution on the interval [-2.0, 2.0], and item discrimination parameters, $a_i$, were sampled from a uniform distribution on the interval [.6, 2.0]. Since the data were generated to simulate responses to four-choice multiple-choice items, the pseudo-chance level parameters, $c_i$, were set at .25. It should be noted, however, that this does not ensure a close approximation of the generated data to real data. Combinations of item difficulty and discrimination that may not occur in constructed tests may occur in simulated tests and hence affect the estimation procedures, limiting the generalizability of the findings of simulation studies to real situations. On the other hand, since the purpose of this study is to compare two estimation procedures and to study the statistical properties of the estimators, a possible lack of correspondence between simulated and real data may not be a serious problem.
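The item-parameter sampling described above can be written as a small helper. This is a minimal sketch assuming independent uniform draws for $a_i$ and $b_i$ and a common $c_i$; setting $c_i$ to one over the number of choices is an assumption that generalizes the value .25 used for four-choice items, and the name `sample_items` is hypothetical.

```python
import random


def sample_items(n_items, rng, n_choices=4):
    """Draw (a_i, b_i, c_i) for n_items simulated multiple-choice items."""
    c = 1.0 / n_choices  # .25 for four-choice items
    return [
        (
            rng.uniform(0.6, 2.0),   # discrimination a_i ~ U[.6, 2.0]
            rng.uniform(-2.0, 2.0),  # difficulty b_i ~ U[-2.0, 2.0]
            c,                       # common pseudo-chance level c_i
        )
        for _ in range(n_items)
    ]


rng = random.Random(20)
items = sample_items(20, rng)
print(items[0])
```

Because $a_i$ and $b_i$ are drawn independently, this scheme can produce the unusual difficulty and discrimination combinations mentioned above.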

Results 


Accuracy of Estimation

Comparisons between the Urry procedure and the maximum likelihood procedure across various test lengths, examinee population sizes, and ability distributions are given in Tables 1, 2, and 3. The statistics reported are: (i) the mean, $\mu$, of the population item parameters for each population size, (ii) the mean, $\bar{X}$, of the estimated item parameters, and (iii) the correlation, $r$, between the true parameters and their estimates. These statistics are reported for the estimates obtained with both the Urry procedure and the maximum likelihood procedure.

Table 1. Comparison of Estimates of Item and Ability Parameters of the LOGIST Procedure with the Urry Procedure Based on a Normal Distribution of Ability

Table 2. Comparison of Estimates of Item and Ability Parameters of the LOGIST Procedure with the Urry Procedure Based on a Skewed Distribution of Ability

Table 3. Comparison of Estimates of Item and Ability Parameters of the LOGIST Procedure with the Urry Procedure Based on a Uniform Distribution of Ability

A comparison of the mean of the generated item parameters, $\mu$, and the mean of the estimates, $\bar{X}$, for each of the item parameters (discrimination, difficulty, and pseudo-chance level) and for the ability parameters provides some indication of the accuracy of estimation. However, this comparison is rather weak when carried out alone, since the means do not contain all the essential information. Simultaneous comparison of the means together with examination of the correlations between parameters and estimates, on the other hand, provides valid information regarding the accuracy of estimation. If the correlation is high but the means differ, then it can be concluded that the estimation was not sufficiently accurate.
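The point that a high correlation alone can mislead is easy to demonstrate. In the sketch below (hypothetical data), the estimates are the true values shifted upward by 0.5: the correlation is perfect, yet every estimate is too large, which only the comparison of $\mu$ with $\bar{X}$ reveals. The helper names are illustrative.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy_summary(true_values, estimates):
    """Return (mu, x_bar, r): mean of true values, mean of estimates, correlation."""
    return fmean(true_values), fmean(estimates), pearson_r(true_values, estimates)


true_a = [0.7, 1.0, 1.3, 1.6, 1.9]      # hypothetical true discriminations
shifted = [a + 0.5 for a in true_a]     # perfectly correlated, but all too high
mu, x_bar, r = accuracy_summary(true_a, shifted)
print(f"mu={mu:.2f}  x_bar={x_bar:.2f}  r={r:.3f}")
```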

Lord (1975) has implied that if heteroscedasticity exists, it may not be meaningful to compute correlations between true and estimated values. We agree with this in general. However, in the strict sense, heteroscedasticity would invalidate the computation of the ordinary least-squares regression line (the more appropriate criterion to employ is generalized least squares) and hence rule out the use of simple, interpretable statistics for evaluating the accuracy of estimation. Heteroscedasticity (when it occurred) was therefore ignored, and correlations and least-squares regression equations were computed.
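The least-squares regression of estimates on true values, together with the standard errors of its coefficients, can be sketched as follows. The standard errors use the ordinary homoscedastic formulas, matching the decision above to ignore heteroscedasticity; the function name and the example values are hypothetical.

```python
import math
from statistics import fmean


def ols_with_se(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, se_b0, se_b1) using the usual homoscedastic standard
    errors with n - 2 residual degrees of freedom.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                      # residual variance
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return b0, b1, se_b0, se_b1


true_vals = [0.7, 1.0, 1.3, 1.6, 1.9]       # hypothetical true values (x)
estimates = [0.95, 1.30, 1.55, 1.95, 2.30]  # hypothetical estimates (y)
b0, b1, se_b0, se_b1 = ols_with_se(true_vals, estimates)
print(f"estimate = {b0:.3f} + {b1:.3f} * true   (SE {se_b0:.3f}, {se_b1:.3f})")
```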
Estimation of Discrimination Parameter

Examination of the results given in Tables 1, 2, and 3 indicates that the discrimination parameter is poorly estimated for short tests. The highest correlation between true values and estimates for a test with ten items and normally distributed ability is .36, with the mean of the estimates exceeding the mean of the true values. The correlations do improve with increasing sample size and test length, with the mean of the estimated values approaching the mean of the true values from above. The highest correlation between estimated and true values is .88, for an 80-item test with 1000 examinees. This trend is also evident for the uniform and skewed distributions of ability. In general, the discrimination parameter is poorly estimated by the Urry procedure, with the estimation improving more rapidly with increasing test length than with increasing examinee population size.

The least-squares regression lines (for normally distributed ability) for predicting the estimates from the true values, given in Table 4, were plotted (not shown) and compared with the line y = x in order to determine the extent of bias in estimation. The regression lines for all test-length and sample-size combinations fell above the line y = x, indicating that the Urry procedure systematically overestimates the discrimination parameter, with the regression lines approaching the line y = x with increasing test length. Again, the "convergence" to the line y = x was more rapid with increasing test length than with increasing sample size.
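Deciding whether a fitted line lies above, below, or across y = x over the range of true values needs only its two endpoints, because the gap between the fitted line and y = x is linear in x. The sketch below assumes the fitted intercept and slope are already available (for example, from a table of regression coefficients); the function name and the example coefficients are hypothetical.

```python
def position_relative_to_identity(b0, b1, lo, hi, tol=1e-12):
    """Classify the line y = b0 + b1 * x against y = x on the interval [lo, hi].

    The gap d(x) = b0 + (b1 - 1) * x is linear in x, so its sign on the
    whole interval is determined by its values at the two endpoints.
    """
    d_lo = b0 + (b1 - 1.0) * lo
    d_hi = b0 + (b1 - 1.0) * hi
    if abs(d_lo) <= tol and abs(d_hi) <= tol:
        return "on y = x"          # no systematic bias over the range
    if d_lo >= 0 and d_hi >= 0:
        return "above"             # systematic overestimation
    if d_lo <= 0 and d_hi <= 0:
        return "below"             # systematic underestimation
    return "crosses"               # over- and underestimation in different parts


# Hypothetical discrimination line over the design range [.6, 2.0].
print(position_relative_to_identity(0.30, 1.10, 0.6, 2.0))   # above
# Hypothetical difficulty line: easy items under-, difficult items overestimated.
print(position_relative_to_identity(-0.05, 1.20, -2.0, 2.0))  # crosses
```

A slope above one with a line that crosses y = x corresponds to the pattern of overestimating large values and underestimating small ones; a line entirely above y = x corresponds to uniform overestimation.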

Trends similar to those observed with the Urry procedure were also observed with the maximum likelihood procedure. Although the estimation of discrimination was poor, the maximum likelihood estimates were consistently better than the "Urry estimates" in that the correlations between true values and estimates were higher and the means of the estimates were much closer to the means of the true values. Comparison of the plots of the regression lines given in Table 4 with the line y = x showed that while there was a general tendency for the parameters to be overestimated, this tendency was not as marked as with the Urry procedure; the "convergence" of the regression lines to the line y = x was more rapid. These trends (higher correlations between true and estimated values than for the Urry estimates, means of the estimates closer to the means of the true values, and more rapid "convergence" of the regression line to the line y = x) were also observed with the uniform and skewed distributions of ability.

Table 4. Regression Coefficients and Standard Errors for Predicting the Estimates from True Values Based on a Normal Distribution of Ability

Estimation of Difficulty Parameter

The Urry procedure was extremely successful in providing accurate estimates of the difficulty parameter. The correlations between estimates and true values ranged from .85 to .99. Comparison of the regression lines for normally distributed ability given in Table 4 with the line y = x indicated that the difficulty parameter was generally overestimated for tests with 15 and 20 items, though not for tests with 10 items. With larger numbers of items, there was a tendency for difficult items to be overestimated and for easy items to be underestimated. However, the bias was slight, in that the regression line converged rapidly to the line y = x with increasing numbers of items and sample size.

The maximum likelihood estimates of the difficulty parameters were, in general, better than the estimates produced by the Urry procedure. The correlations between true and estimated values ranged from .88 to 1.00 (the Urry procedure yielded correlations ranging from .85 to .99). The means of the estimates were, in general, closer to the means of the true values than they were with the Urry procedure. Comparison of the regression lines given in Table 4 with the line y = x revealed that with increasing test length and sample size, the regression line approached the line y = x rather rapidly, indicating that there was no bias in the estimation. No clear trends were visible with 10, 15, and 20 items, although the test with 10 items and 50 examinees produced overestimates of the difficulty parameter. These results appeared to hold for both the uniform and skewed distributions of ability, although with the skewed distribution there were two instances in which the estimates of difficulty went out of bounds. These cases are indicated with an asterisk in Table 2. However, with 80 items and 1000 examinees, the agreement between estimated and true values was comparable to that obtained with normally distributed ability.

In general, the difficulty parameter was estimated rather well by both the maximum likelihood and Urry procedures. The maximum likelihood procedure fared surprisingly well with small numbers of items and examinees in comparison with the Urry procedure and, in general, produced better estimates (as determined by the correlations) than the Urry procedure.

Chance-Level Parameter

The true value of the chance-level parameter, $c_i$, was set at .25 for all items. Given this lack of variation among the true values, correlations between estimates and true values were not computed. Hence, only the mean of the true values, the mean of the estimates, and the standard deviation of the estimates are reported in Tables 1, 2, and 3.

The Urry procedure clearly produced very poor estimates of the chance-level parameter. The means of the estimates were consistently higher than the mean of the true values, with relatively large standard deviations. Maximum likelihood estimates, on the other hand, were close to the true values, with small standard deviations. The mean maximum likelihood estimates ranged from .12 to .25 for normally distributed ability, from .19 to .25 for the skewed distribution of ability, and from .18 to .25 for uniformly distributed ability. In comparison, the Urry procedure yielded mean estimates that ranged from .20 to .36, .20 to .56, and .22 to .46, respectively, for the three distributions of ability.

Estimation of Ability

An examination of Tables 1, 2, and 3 indicates a consistent pattern in the estimation of abilities for both the maximum likelihood and Urry procedures. The correlations between true values and estimates do not seem to be affected by increasing sample size for fixed test lengths. On the other hand, increasing the length of the test greatly affects the magnitude of the agreement between true values and estimates. This unsurprising trend holds for all three distributions of ability.

In general, it appears that although no differences exist between the "Urry estimates" and the maximum likelihood estimates of ability for tests with 15 items or more, the maximum likelihood estimates fare better than the "Urry estimates" for short tests with 10 items. This effect is more pronounced with the skewed ability distribution.

A closer examination of the two estimates, carried out by comparing the regression lines obtained by regressing the estimates on the true values with the line y = x, indicates that the Urry procedure, in general, underestimates the abilities of examinees with high true abilities and overestimates the abilities of examinees with low true abilities. This may partly be attributed to the fact that the chance-level parameters are overestimated. No such trends were evident with the maximum likelihood estimates. The regression lines rapidly converged to the line y = x with increasing test length.

Effect of Ability Distribution

As pointed out earlier, the Urry procedure exploits the relationships that exist between the classical item parameters and the parameters of the latent trait model. These relationships are derived under the assumption that ability is normally distributed and that the item characteristic curve is the normal ogive. To investigate the effect of departures from normality on the estimates, three distributions of ability were generated (the normal, the uniform, and a Beta distribution with parameters 5 and 1.5 to simulate a skewed distribution), and the parameters were estimated. A $\chi^2$ test was carried out to determine whether the uniform and the Beta distributions deviated sufficiently from the normal. The Beta distribution yielded a $\chi^2$ value of 63.5 when the tails of the normal distribution were excluded and a value of 193.1 when the tails were included. The uniform distribution yielded a $\chi^2$ value of 69.6 when the tails were excluded and 307.7 when the tails were included. This indicates that both distributions deviated sufficiently from the normal, with the uniform distribution deviating even more than the Beta distribution.
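A $\chi^2$ comparison of a generated ability sample with the standard normal can be sketched as below. The paper does not state the class intervals or exactly which regions count as the tails, so the bin edges here (half-unit intervals on [-2, 2], with the regions beyond ±2 treated as the tails) are an assumption, and the resulting values will not reproduce those reported above. The Beta(5, 1.5) draws are standardized with the distribution's exact mean, 5/6.5, and standard deviation, 1/6.5; the uniform is drawn on $[-\sqrt{3}, \sqrt{3}]$, the [-1.73, 1.73] interval of the design. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

# Assumed class intervals: half-unit bins on [-2, 2]; beyond +/-2 are the "tails".
EDGES = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]


def standardized_sample(kind, n, rng):
    """Draw n abilities whose population mean is 0 and variance is 1."""
    if kind == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "uniform":
        h = math.sqrt(3.0)  # U(-h, h) has variance h**2 / 3 = 1
        return [rng.uniform(-h, h) for _ in range(n)]
    if kind == "beta":
        al, be = 5.0, 1.5
        mean = al / (al + be)
        sd = math.sqrt(al * be / ((al + be) ** 2 * (al + be + 1.0)))
        return [(rng.betavariate(al, be) - mean) / sd for _ in range(n)]
    raise ValueError(f"unknown distribution: {kind}")


def chi_square_vs_normal(sample, include_tails=True):
    """Pearson chi-square of binned counts against N(0, 1) expected counts."""
    cdf = NormalDist().cdf
    bins = list(zip(EDGES[:-1], EDGES[1:]))
    if include_tails:
        bins = [(-math.inf, EDGES[0])] + bins + [(EDGES[-1], math.inf)]
    n = len(sample)
    stat = 0.0
    for lo, hi in bins:
        observed = sum(1 for x in sample if lo <= x < hi)
        expected = n * (cdf(hi) - cdf(lo))
        stat += (observed - expected) ** 2 / expected
    return stat


rng = random.Random(1978)
for kind in ("normal", "uniform", "beta"):
    s = standardized_sample(kind, 1000, rng)
    print(f"{kind:8s} mean={fmean(s):+.2f} sd={pstdev(s):.2f} "
          f"chi2 tails excluded={chi_square_vs_normal(s, False):.1f} "
          f"included={chi_square_vs_normal(s):.1f}")
```

Under this binning, both non-normal samples are penalized heavily in the tails: the standardized uniform cannot exceed ±1.73, and the standardized Beta(5, 1.5) cannot exceed 1.5, although it has a long lower tail.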
Comparison of the results in Tables 1, 2, and 3 reveals that, in general, the Beta distribution affected both estimation procedures, while the uniform distribution produced results similar to those obtained with a normal ability distribution. Although the Beta distribution affected the estimation of discrimination for both procedures, and of the chance-level and ability parameters for the Urry procedure, the estimation of difficulty did not seem to be affected in either case. The Urry procedure fared poorly with the skewed distribution in comparison with the maximum likelihood procedure in the estimation of the discrimination, chance-level, and ability parameters.

The estimates of the discrimination parameter resulting from both procedures were negatively correlated with the true values for short tests. For longer tests, although the estimates from both procedures improved, the Urry procedure produced poor estimates in comparison with the maximum likelihood procedure. For an 80-item test with 1000 examinees, a correlation of .68 was obtained with the Urry procedure, compared with a correlation of .82 from the maximum likelihood procedure.

The estimates of the chance-level parameters resulting from the Urry procedure were extremely high for all tests except those with 80 items. The mean values ranged from .20 to .56 with the Beta distribution, compared with a range of .20 to .36 for the normal distribution of ability. The maximum likelihood estimates, on the other hand, tended to fall below the true value but were comparable to those obtained with a normal distribution of ability.
The maximum likelihood estimates of ability obtained with a skewed distribution of ability were as good as, and in some cases better than, the estimates obtained with a normal distribution. In contrast, the Urry procedure yielded poorer estimates with a skewed distribution. This effect held even as sample size and test length increased.

In summary, the "Urry estimates" of the ability, discrimination, and chance-level parameters seemed to be affected more dramatically than the maximum likelihood estimates when ability had a skewed distribution. It should be noted that although the uniform distribution had a larger $\chi^2$ value than the Beta distribution, the results obtained with the uniform distribution of ability were similar to those obtained with the normal distribution. It is, then, not departures from normality, but departures from symmetry and the unavailability of examinees in the lower tail of the ability distribution, that affected the estimation procedures.
Statistical Properties of Estimation

Bias. If $g$ is an estimator of $\gamma$, then $g$ is an unbiased estimator of $\gamma$ if

$$E(g) = \gamma,$$

where $E(\cdot)$ is the expectation operator. Unbiasedness is a desirable property of estimators.

Schmidt (1977) has pointed out that the Urry method based on the procedure developed by Urry (1974) systematically overestimates the discrimination parameter and underestimates the difficulty parameter. Urry (1976) has suggested a correction for this and has incorporated it into the modified Urry procedure employed to estimate parameters in this study. Since it appears that the estimates are unbiased for large numbers of items and examinees (Lord, 1975), a relatively short test (20 items) with 200 examinees was selected in order to study the effect of this correction on the estimates and to examine whether the maximum likelihood estimates are unbiased. Response data were generated and item parameters estimated, and this process was replicated 20 times. Since the replications were obtained by generating new sets of random examinees, the bias in the estimator of ability was not investigated.

The results of the replications are presented in Table 5, where the true values, $\mu$, of the 20 item parameters are given together with the mean estimate, $\bar{X}$, of each item parameter over the 20 replications. The standard error and the $t$ value, obtained as

$$t = \frac{\bar{X} - \mu}{SE},$$

are also given to indicate the degree of departure of the mean estimate from the true value.
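For a single item parameter, the $t$ value can be computed from its replicated estimates as sketched below. The text does not define SE precisely; this sketch assumes it is the standard error of the mean estimate, $s/\sqrt{R}$, where $s$ is the standard deviation of the estimates over the $R$ replications. The function name and the replicate values are hypothetical.

```python
import math
from statistics import fmean, stdev


def bias_t(estimates, true_value):
    """Mean estimate, its standard error, and t = (mean - true) / SE.

    estimates: one item parameter's estimates over R replications (R >= 2).
    SE is taken to be the standard error of the mean, s / sqrt(R).
    """
    r = len(estimates)
    x_bar = fmean(estimates)
    se = stdev(estimates) / math.sqrt(r)
    return x_bar, se, (x_bar - true_value) / se


# Hypothetical discrimination estimates for one item (true value 1.20) over 20 replications.
reps = [1.31, 1.45, 1.22, 1.38, 1.29, 1.50, 1.27, 1.41, 1.35, 1.19,
        1.44, 1.33, 1.26, 1.48, 1.37, 1.30, 1.42, 1.24, 1.39, 1.34]
x_bar, se, t = bias_t(reps, 1.20)
print(f"mean={x_bar:.3f}  SE={se:.4f}  t={t:.2f}")
```

A large positive $t$, as in this made-up example, signals that the mean estimate lies systematically above the true value, that is, overestimation.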
The Urry procedure clearly overestimates the discrimination parameter, as does the maximum likelihood procedure. However, the bias in the maximum likelihood estimates does not appear to be as severe as the bias in the Urry estimates. This finding is borne out in Figure 1, where the regression line for predicting $\bar{X}$ from $\mu$ is plotted for both the Urry and maximum likelihood procedures and compared with the line y = x.

The maximum likelihood regression line is closer to the line y = x and shows that small values of discrimination are overestimated while very large values tend to be estimated accurately, partly because an upper limit was imposed on the estimates. On the other hand, the Urry procedure tends to overestimate large values of discrimination even more than small values.

With item difficulty, the maximum likelihood procedure tends to underestimate the difficulty of easy items while producing relatively accurate estimates for very difficult items (Figure 2). The Urry procedure, on the other hand, tends to overestimate the difficulty of items with large difficulty levels and to underestimate that of items with negative difficulty levels. In general, the Urry procedure seems to produce biased estimates of item difficulty throughout the entire range.

Consistency. If $g_n$ is an estimator of $\gamma$, then $g_n$ is a consistent estimator of $\gamma$ if for any positive $\varepsilon$ and $\eta$ there is some $N$ such that

$$\mathrm{Prob}\{|g_n - \gamma| < \varepsilon\} > 1 - \eta, \qquad n > N.$$

Consistency is a desirable property in that it ensures that an estimator tends to a definite quantity, namely the true value to be estimated.
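The definition can be illustrated with a Monte Carlo estimate of $\mathrm{Prob}\{|g_n - \gamma| < \varepsilon\}$. Purely for illustration, $g_n$ here is the mean of $n$ normal draws centered on $\gamma = 0$, not a latent trait estimator; as $n$ grows the estimated probability approaches one, which is what consistency requires. The names are hypothetical.

```python
import random
from statistics import fmean


def prob_within(n, eps, reps, rng, gamma=0.0):
    """Monte Carlo estimate of Prob{|g_n - gamma| < eps}.

    For illustration, g_n is the mean of n draws from N(gamma, 1).
    """
    hits = 0
    for _ in range(reps):
        g_n = fmean(rng.gauss(gamma, 1.0) for _ in range(n))
        if abs(g_n - gamma) < eps:
            hits += 1
    return hits / reps


rng = random.Random(1977)
probs = {n: prob_within(n, eps=0.1, reps=1000, rng=rng) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n={n:5d}  Prob(|g_n - gamma| < 0.1) ~ {p:.3f}")
```

For this toy estimator the exact probabilities are about .25, .68, and .998, so for any $\eta$ a large enough $N$ can be found.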

The problem of consistency has raised several questions concerning the estimation of parameters in latent trait models. Andersen (1972) has argued that a consistent estimator of the discrimination parameter does not exist and hence has questioned the meaningfulness of the two- and three-parameter models.

Figure 1. Bias in the estimation of the discrimination parameter.

To investigate whether the maximum likelihood estimators and the "Urry estimators" are consistent, the regression equations for predicting the estimates from the true values of the various parameters were examined. It follows from the definition given earlier that an estimator is consistent if (i) it is asymptotically unbiased and (ii) its variance tends to zero with increasing sample size. Hence, for the estimators of the latent trait parameters to be consistent, (i) the slope of the regression equation must approach one and the intercept must approach zero, and (ii) the variances, and hence the standard errors, of the estimated slope and intercept must approach zero. If these conditions are met, the estimator is consistent.
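The regression-based check described above can be sketched with a hypothetical estimator whose bias and sampling noise both shrink as $1/\sqrt{n}$; it stands in for LOGIST or Urry output, which is not reproduced here, and $n$ loosely plays the role of growing test length and sample size. As $n$ grows, the fitted slope should approach one, the intercept zero, and both standard errors zero. All names and settings are illustrative assumptions.

```python
import math
import random
from statistics import fmean


def ols_with_se(x, y):
    """Least-squares (b0, b1, se_b0, se_b1) with homoscedastic standard errors."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b0, b1, math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx)), math.sqrt(s2 / sxx)


def hypothetical_estimates(true_values, n, rng):
    """Stand-in estimator: bias and noise both shrink like 1 / sqrt(n)."""
    k = 1.0 / math.sqrt(n)
    return [t * (1.0 + k) + 0.5 * k + rng.gauss(0.0, k) for t in true_values]


rng = random.Random(7)
true_values = [rng.uniform(-2.0, 2.0) for _ in range(100)]
results = {}
for n in (10, 100, 1000, 10000):
    estimates = hypothetical_estimates(true_values, n, rng)
    results[n] = ols_with_se(true_values, estimates)
    b0, b1, se_b0, se_b1 = results[n]
    print(f"n={n:6d}  intercept={b0:+.3f}  slope={b1:.3f}  "
          f"SE(intercept)={se_b0:.4f}  SE(slope)={se_b1:.4f}")
```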

The regression coefficients and their standard errors are reported in Table 4. The results indicate that when both the number of items and the number of examinees increase, the slope and intercept coefficients approach one and zero, respectively, with the standard errors approaching zero. This tendency is evident for both the Urry and maximum likelihood estimators of the discrimination, difficulty, chance-level, and ability parameters. In all these cases, the maximum likelihood estimator converges in probability to the true value more rapidly than the Urry estimator. It should be pointed out, however, that the results reported here do not conclusively support this. It is clearly necessary to examine the standard errors and the regression coefficients with greater numbers of items and examinees.
DISCUSSION

The purpose of this study was to compare two methods for estimating the parameters of the three-parameter logistic model: the Urry method of estimation and the maximum likelihood procedure. The computer programs used to carry out this study were the ANCILLES program and the LOGIST program (Wood, Wingersky, & Lord, 1976). The efficiency of the procedures was compared with respect to the accuracy of estimation, the effect of violating underlying assumptions (for the Urry procedure), and the statistical properties of the estimators. The factors controlled were test length (4 levels), examinee population size (3 levels), and ability distribution (3 levels).

The results indicate that, in general, the maximum likelihood procedure is superior to the Urry procedure with respect to the estimation of all item and ability parameters. The differences were pronounced in the estimation of the discrimination and chance-level parameters, while with respect to the estimation of the ability and difficulty parameters, the differences were less remarkable. Differing ability distributions had little effect on the estimation of the difficulty and ability parameters. However, with a skewed distribution of ability, the Urry procedure produced poorer estimates of the discrimination and chance-level parameters than with normal or uniform ability distributions. The maximum likelihood procedure, although faring better than the Urry procedure (with the exception of the 10-item test), produced slightly poorer results with the skewed distribution than with the normal or uniform distribution.

The number of examinees had a slight effect in improving the accuracy of estimation of the difficulty, chance-level, and ability parameters. However, increasing the number of items and the number of examinees considerably improved the accuracy of the discrimination estimates with both procedures. Surprisingly, a 20-item test with 1000 examinees produced excellent estimates of the difficulty and chance-level parameters, and reasonably good estimates of the discrimination and ability parameters. Tests with 80 items and 1000 examinees fared considerably better, providing good estimates of all parameters. Tests with 15 items or fewer, while yielding good estimates of the difficulty and chance-level parameters and reasonable estimates of the ability parameters, yielded poor estimates of the discrimination parameter. This severely limits the application of the three-parameter latent trait model to criterion-referenced measurement situations, since criterion-referenced tests typically have fewer than 10 items. However, it should be pointed out that this limitation exists only if the item and ability parameters are estimated simultaneously. If item banks with known item characteristics are employed to estimate ability, or if the Rasch model is employed, this limitation may not exist.

Although the maximum likelihood estimates were superior to the Urry estimates, especially in the case of short tests, the difference between them was negligible when the number of items and the number of examinees increased. This is of particular importance, since the Urry procedure requires considerably less computer time than the maximum likelihood procedure. The time taken by the maximum likelihood procedure, especially with large numbers of items and examinees, may become prohibitive enough to warrant the use of the Urry procedure in this situation. It should be noted, in fairness to the maximum likelihood procedure, that the Urry procedure in general deletes more items and examinees during estimation than the maximum likelihood procedure does. This may explain the rapidity of convergence and indicates a weakness in the Urry procedure.

The bias and consistency results indicate that for small numbers of items, the estimates of the item and ability parameters are biased, with the Urry estimates being more biased than the maximum likelihood estimates. As the number of examinees and the number of items increase, the estimators appear to be unbiased and, in fact, consistent. This in a sense supports a conjecture of Lord (1968) and shows that the three-parameter model may be statistically viable.